reachable belief space
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.53)
7f2be1b45d278ac18804b79207a24c53-AuthorFeedback.pdf
We thank the reviewers for their insightful feedback. We address reviewer comments below and begin by situating the paper's intended contribution: Why is this our goal? POMDP planners incur the complexity of full, closed-loop planning only when necessary. V oI is "contrary to the core concept of POMDPs", V oI macro-actions expand the set of problems that can be efficiently What is not our goal? The primary critique of reviewers is the limited scope of our experimental results.
Covering Number as a Complexity Measure for POMDP Planning and Learning
Zhang, Zongzhang (University of Science and Technology of China) | Littman, Michael (Rutgers University) | Chen, Xiaoping (University of Science and Technology of China)
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is a set of belief points that are reachable from the initial belief point, has interesting links with the complexity of POMDP planning, theoretically. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just ``covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset given knowledge of the covering number.
- Asia > China > Anhui Province > Hefei (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (2 more...)